Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output
Author: Maja Popović
Abstract
We describe Hjerson, a tool for automatic classification of errors in machine translation output. The tool features the detection of five word-level error classes: morphological errors, reordering errors, missing words, extra words and lexical errors. As input, the tool requires the original full-form reference translation(s) and hypothesis along with their corresponding base forms. It is also possible to use additional information on the word level (e.g. tags) in order to obtain more details. The tool provides the raw count and the normalised score (error rate) for each error class at the document level and at the sentence level, as well as the original reference and hypothesis words labelled with the corresponding error class in text and HTML formats.

1. Motivation

Human error classification and analysis of machine translation output, presented in (Vilar et al., 2006), have become widely used in recent years in order to get detailed answers about the strengths and weaknesses of a translation system. Other types of human error analysis have also been carried out, e.g. (Farrús et al., 2009) for the Spanish and Catalan languages. However, human error classification is a difficult and time-consuming task, and automatic methods are needed. Hjerson is a tool for automatic error classification which systematically covers the main word-level error categories defined in (Vilar et al., 2006): morphological (inflectional) errors, reordering errors, missing words, extra words and lexical errors. It implements a method based on the standard word error rate (WER) combined with the precision- and recall-based error rates (Popović and Ney, 2007), and it has been tested on various language pairs and tasks. The obtained results show high correlation (between 0.6 and 1.0) with the results obtained by human evaluators (Popović and Burchardt, 2011; Popović and Ney, 2011). The tool is written in Python and is available under an open-source licence. We hope that the release of the toolkit will facilitate error analysis and classification for researchers, and also stimulate further development of the proposed method.

© 2011 PBML. All rights reserved. Corresponding author: [email protected]
Cite as: Maja Popović. Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output. The Prague Bulletin of Mathematical Linguistics No. 96, 2011, pp. 59–67. doi: 10.2478/v10108-011-0011-4.
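The classification idea described above can be sketched in a few lines. The following is a minimal illustration, not Hjerson's actual implementation: it aligns the hypothesis to the reference with a standard WER (Levenshtein) alignment and uses the base forms to separate morphological from lexical errors. Reordering errors, which Hjerson derives from the precision- and recall-based error rates, are omitted from this sketch, and the function name and toy example are our own.

```python
# Illustrative sketch of WER-based error classification (not Hjerson's
# actual code). Assumes full forms and base forms are given in parallel.

def classify_errors(hyp, ref, hyp_base, ref_base):
    """Count morphological, lexical, missing-word and extra-word errors.

    A substitution whose base forms match is counted as a morphological
    (inflectional) error, otherwise as a lexical error; deletions count
    as missing words and insertions as extra words. Reordering errors
    are not handled in this simplified sketch.
    """
    n, m = len(ref), len(hyp)
    # Standard edit-distance DP table.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match/substitution
    # Backtrace through the alignment and classify each edit operation.
    counts = {"morph": 0, "lex": 0, "miss": 0, "ext": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1]
                and ref[i - 1] == hyp[j - 1]):
            i, j = i - 1, j - 1              # correct word
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            if ref_base[i - 1] == hyp_base[j - 1]:
                counts["morph"] += 1         # same lemma, wrong form
            else:
                counts["lex"] += 1           # different lemma
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["miss"] += 1              # reference word not produced
            i -= 1
        else:
            counts["ext"] += 1               # spurious hypothesis word
            j -= 1
    return counts


counts = classify_errors(["the", "cat", "sleep", "now"],
                         ["the", "cats", "sleep"],
                         ["the", "cat", "sleep", "now"],
                         ["the", "cat", "sleep"])
# "cats" vs. "cat" shares the lemma, so it is a morphological error,
# and "now" is an extra word.
```

Normalising each count by the reference length (or, for extra words, the hypothesis length) gives the per-class error rates that Hjerson reports at the sentence and document level.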
Similar resources
Automatic MT Error Analysis: Hjerson Helping Addicter
We present a complex, open source tool for detailed machine translation error analysis providing the user with automatic error detection and classification, several monolingual alignment algorithms as well as with training and test corpus browsing. The tool is the result of a merge of automatic error detection and classification of Hjerson (Popović, 2011) and Addicter (Zeman et al., 2011) into ...
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output
We describe Appraise, an open-source toolkit supporting manual evaluation of machine translation output. The system allows the collection of human judgments on translation output, implementing annotation tasks such as 1) quality checking, 2) translation ranking, 3) error classification, and 4) manual post-editing. It features an extensible, XML-based format for import/export and can easily be adapted...
Blast: A Tool for Error Analysis of Machine Translation Output
We present BLAST, an open source tool for error analysis of machine translation (MT) output. We believe that error analysis, i.e., to identify and classify MT errors, should be an integral part of MT development, since it gives a qualitative view, which is not obtained by standard evaluation methods. BLAST can aid MT researchers and users in this process, by providing an easy-to-use graphical u...
From Human to Automatic Error Classification for Machine Translation Output
Future improvement of machine translation systems requires reliable automatic evaluation and error classification measures to avoid time and money consuming human classification. In this article, we propose a new method for automatic error classification and systematically compare its results to those obtained by humans. We show that the proposed automatic measures correlate well with human jud...
Error Analysis of Statistical Machine Translation Output
Evaluation of automatic translation output is a difficult task. Several performance measures like Word Error Rate, Position-Independent Word Error Rate and the BLEU and NIST scores are widely used and provide a useful tool for comparing different systems and for evaluating improvements within a system. However, the interpretation of all of these measures is not at all clear, and the identification o...